6 research outputs found

    SODA: Generating SQL for Business Users

    Full text link
    The purpose of data warehouses is to enable business analysts to make better decisions. Over the years the technology has matured and data warehouses have become extremely successful. As a consequence, more and more data has been added to the data warehouses and their schemas have become increasingly complex. These systems still work well for generating pre-canned reports. However, with their current complexity, they tend to be a poor match for non-tech-savvy business analysts who need answers to ad-hoc queries that were not anticipated. This paper describes the design, implementation, and experience of the SODA system (Search over DAta Warehouse). SODA bridges the gap between the business needs of analysts and the technical complexity of current data warehouses. SODA enables a Google-like search experience for data warehouses by taking keyword queries of business users and automatically generating executable SQL. The key idea is to use a graph pattern matching algorithm that uses the metadata model of the data warehouse. Our results with real data from a global player in the financial services industry show that SODA produces queries with high precision and recall, and makes it much easier for business users to interactively explore highly complex data warehouses. Comment: VLDB2012
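    To make the key idea concrete, here is a minimal Python sketch of the approach the abstract describes: keywords are matched against a metadata graph of tables and columns, a join path connecting the matches is found by graph search, and executable SQL is emitted. The schema, foreign-key edges, and matching rules below are invented for illustration and are not SODA's actual implementation.

```python
# A minimal sketch of the SODA idea (not the authors' implementation):
# match keywords against a metadata graph, connect the hits with a join
# path, and emit SQL. The schema below is hypothetical.
from collections import deque

TABLES = {
    "customer": ["customer_id", "name", "country"],
    "account":  ["account_id", "customer_id", "balance"],
    "trade":    ["trade_id", "account_id", "instrument", "volume"],
}
FK_EDGES = {  # foreign-key edges; value = join predicate
    ("customer", "account"): "customer.customer_id = account.customer_id",
    ("account", "trade"):    "account.account_id = trade.account_id",
}

def match_keywords(keywords):
    """Map each keyword to a (table, column) hit in the metadata graph."""
    hits = []
    for kw in (k.lower() for k in keywords):
        for table, cols in TABLES.items():
            if kw == table:
                hits.append((table, cols[0]))   # table name -> its key column
            elif kw in cols:
                hits.append((table, kw))
    return hits

def join_path(src, dst):
    """BFS over foreign-key edges to connect two matched tables."""
    adj = {}
    for a, b in FK_EDGES:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return [src]

def generate_sql(keywords):
    hits = match_keywords(keywords)
    assert hits, "no keyword matched the metadata graph"
    tables = sorted({t for t, _ in hits})
    # two matched tables for simplicity; SODA handles the general case
    path = join_path(tables[0], tables[-1]) if len(tables) > 1 else tables
    preds = [FK_EDGES.get((a, b)) or FK_EDGES.get((b, a))
             for a, b in zip(path, path[1:])]
    sql = f"SELECT {', '.join(f'{t}.{c}' for t, c in hits)} FROM {', '.join(path)}"
    if preds:
        sql += " WHERE " + " AND ".join(preds)
    return sql

print(generate_sql(["customer", "volume"]))
# SELECT customer.customer_id, trade.volume FROM customer, account, trade
# WHERE customer.customer_id = account.customer_id AND account.account_id = trade.account_id
```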

    My Private Google Calendar

    No full text
    Everybody loves Google Apps. Google provides highly available web applications that help you communicate, organize, and collaborate from anywhere using different interfaces in the most user-friendly and efficient way, without being worried about any IT issues. However, some people still hesitate to use Google services because of privacy and trust issues. In this paper, we identify privacy issues in Google Web Applications as a particularly vital problem and propose a solution. In our solution, a transparent encryption layer is put between the user and the cloud service provider on a site trusted by the user. This layer processes the request and response messages passed between the two parties at a fine-grained level. It applies modern cryptography techniques to encrypt the data without sacrificing the functionality and portability of the cloud service. This way the trust of the end user can be regained, and he or she will be encouraged to continue to enjoy using web applications such as Google Apps without having to worry about privacy issues.
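    As a concrete illustration of such a layer, the sketch below encrypts sensitive fields of a calendar event on the trusted site before a request reaches the provider, and decrypts them on the way back. It uses the third-party cryptography package's Fernet cipher purely as a stand-in: Fernet is randomized, so a real deployment would need the paper's functionality-preserving techniques (e.g., searchable encryption). The field names are hypothetical.

```python
# A minimal sketch of a transparent encryption layer, assuming a
# JSON calendar-event payload. Fernet stands in for the paper's
# functionality-preserving cryptography; field names are hypothetical.
import json
from cryptography.fernet import Fernet

SENSITIVE_FIELDS = {"summary", "description", "location"}

class EncryptionLayer:
    """Sits between the user and the cloud provider on a trusted site."""

    def __init__(self, key: bytes):
        self.cipher = Fernet(key)

    def outgoing(self, request_body: str) -> str:
        """Encrypt sensitive fields before the request leaves the trusted site."""
        event = json.loads(request_body)
        for field in SENSITIVE_FIELDS & event.keys():
            event[field] = self.cipher.encrypt(event[field].encode()).decode()
        return json.dumps(event)

    def incoming(self, response_body: str) -> str:
        """Decrypt sensitive fields in the provider's response."""
        event = json.loads(response_body)
        for field in SENSITIVE_FIELDS & event.keys():
            event[field] = self.cipher.decrypt(event[field].encode()).decode()
        return json.dumps(event)

layer = EncryptionLayer(Fernet.generate_key())
sent = layer.outgoing('{"summary": "Doctor appointment", "start": "2024-05-01T09:00"}')
print(sent)                  # the provider only ever sees ciphertext for "summary"
print(layer.incoming(sent))  # the user gets the original plaintext back
```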

    Mison: A Fast JSON Parser for Data Analytics

    No full text
    The growing popularity of the JSON format has fueled increased interest in loading and processing JSON data within analytical data processing systems. However, in many applications, JSON parsing dominates performance and cost. In this paper, we present a new JSON parser called Mison that is particularly tailored to this class of applications, by pushing down both projection and filter operators of analytical queries into the parser. To achieve these features, we propose to deviate from the traditional approach of building parsers using finite state machines (FSMs). Instead, we follow a two-level approach that enables the parser to jump directly to the correct position of a queried field without having to perform expensive tokenizing steps to find the field. At the upper level, Mison speculatively predicts the logical locations of queried fields based on previously seen patterns in a dataset. At the lower level, Mison builds structural indices on JSON data to map logical locations to physical locations. Unlike all existing FSM-based parsers, building structural indices converts control flow into data flow, thereby largely eliminating inherently unpredictable branches in the program and exploiting the parallelism available in modern processors. We experimentally evaluate Mison using representative real-world JSON datasets and the TPC-H benchmark, and show that Mison produces significant performance benefits over the best existing JSON parsers; in some cases, the performance improvement is over one order of magnitude.
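    The structural-index idea can be sketched in a few lines of Python. This is not Mison's actual SIMD/bitmap implementation; it only illustrates the principle that one pass over the structural characters lets the parser jump straight to a queried field's position without re-tokenizing the record. The example record and field names are invented.

```python
# A minimal sketch of a structural index: one scan records the colon
# positions of top-level fields while tracking string and nesting
# state; projection then jumps straight to the recorded position.
def build_structural_index(record: str):
    """Return [(key, colon_pos)] for the top-level fields of one JSON object."""
    index, depth, in_str = [], 0, False
    key_start = key_end = 0
    i = 0
    while i < len(record):
        c = record[i]
        if in_str:
            if c == "\\":
                i += 1                     # skip the escaped character
            elif c == '"':
                in_str, key_end = False, i
        elif c == '"':
            in_str, key_start = True, i + 1
        elif c in "{[":
            depth += 1
        elif c in "}]":
            depth -= 1
        elif c == ":" and depth == 1:      # top-level field found
            index.append((record[key_start:key_end], i))
        i += 1
    return index

def project_field(record: str, index, field: str) -> str:
    """Jump to the field's colon and slice its raw value (no tokenizing).
    String state is omitted here for brevity, so string values containing
    ',' or brackets would need the full index."""
    for key, pos in index:
        if key == field:
            end, depth = pos + 1, 0
            while end < len(record):
                c = record[end]
                if c in "{[":
                    depth += 1
                elif c in "}]":
                    if depth == 0:
                        break
                    depth -= 1
                elif c == "," and depth == 0:
                    break
                end += 1
            return record[pos + 1:end].strip()
    raise KeyError(field)

rec = '{"id": 7, "name": "mison", "tags": ["json", "parser"]}'
idx = build_structural_index(rec)
print(project_field(rec, idx, "tags"))     # ["json", "parser"]
```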

    Information Management in the Cloud (Dagstuhl Seminar 11321)

    No full text
    Cloud computing is emerging as a new paradigm for highly scalable, fault-tolerant, and adaptable computing on large clusters of off-the-shelf computers. Cloud architectures strive to massively parallelize complex processing tasks through a computational model motivated by functional programming. They provide highly available storage and compute capacity through distribution and redundancy. Most importantly, cloud architectures adapt to changing requirements by dynamically provisioning new (virtualized) compute or storage nodes. Economies of scale enable cloud providers to offer compute and storage capacity to a multitude of users. On the infrastructure side, such a model has been pioneered by Amazon with EC2, whereas software as a service on cloud infrastructures with multi-tenancy has been pioneered by Salesforce.com. The Dagstuhl Seminar 11321 "Information Management in the Cloud" brought together a diverse set of researchers and practitioners with a broad range of expertise. The purpose of this seminar was to consider and discuss causes, opportunities, and solutions for technologies and architectures that enable cloud information management. The scope ranged from web-scale log file analysis using cluster computing techniques to dynamic provisioning of resources in data centers, covering topics from the areas of analytical and transactional processing, parallelization of large-scale data- and compute-intensive operations, as well as implementation techniques for fault tolerance.
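    The "computational model motivated by functional programming" that the description mentions is commonly illustrated by MapReduce-style word counting. The single-machine Python sketch below is not taken from the seminar material; it only shows why pure map and reduce functions make massive parallelization and re-execution straightforward.

```python
# A minimal single-machine sketch of the functional model alluded to
# above: side-effect-free map and reduce steps let a runtime schedule,
# parallelize, and re-execute tasks freely across a cluster.
from collections import defaultdict
from functools import reduce

def map_phase(doc: str):
    """Emit (word, 1) pairs; no shared state, so mappers can run anywhere."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all counts for one key; associative, hence parallelizable."""
    return key, reduce(lambda a, b: a + b, values)

docs = ["cloud computing scales", "cloud storage scales"]
pairs = [p for doc in docs for p in map_phase(doc)]
print(dict(reduce_phase(k, v) for k, v in shuffle(pairs).items()))
# {'cloud': 2, 'computing': 1, 'scales': 2, 'storage': 1}
```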

    Data Engineering

    No full text
    As network connectivity has continued its explosive growth and as storage devices have become smaller, faster, and less expensive, the number of online digitized images has increased rapidly. Successful queries on large, heterogeneous image collections cannot rely on the use of text matching alone. In this paper we describe how we use image analysis in conjunction with an object-relational database to provide both textual and content-based queries on a very large collection of digital images. We discuss the effects of feature computation, retrieval speed, and development issues on our feature storage strategy. 1 Introduction: A recent search of the World Wide Web found 16 million pages containing the word "gif" and 3.2 million containing "jpeg" or "jpg." Many of these images have little or no associated text, and what text they do have is completely unstructured. Similarly, commercial image databases may contain hundreds of thousands of images with little useful text. To fully utilize...
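    A minimal sketch of the combined textual and content-based querying the abstract describes, assuming a color histogram as the image feature and an in-memory list in place of the object-relational database; the system's actual features and storage strategy are the subject of the paper itself.

```python
# A minimal sketch of combining text matching with content-based
# retrieval. The color-histogram feature and the in-memory "database"
# are assumptions for illustration, not the system's implementation.
import numpy as np

def color_histogram(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Per-channel histogram, normalized so images of any size compare."""
    hist = np.concatenate([
        np.histogram(pixels[..., ch], bins=bins, range=(0, 256))[0]
        for ch in range(3)
    ]).astype(float)
    return hist / hist.sum()

# Hypothetical stored collection: precomputed features plus loose text.
rng = np.random.default_rng(0)
collection = [
    {"id": i,
     "text": text,
     "feature": color_histogram(rng.integers(0, 256, (32, 32, 3)))}
    for i, text in enumerate(["sunset beach", "city skyline", "beach volleyball"])
]

def query(text_term: str, example_pixels: np.ndarray, k: int = 2):
    """Text match narrows the candidates; feature distance ranks them."""
    target = color_histogram(example_pixels)
    candidates = [img for img in collection if text_term in img["text"]]
    candidates.sort(key=lambda img: np.linalg.norm(img["feature"] - target))
    return [img["id"] for img in candidates[:k]]

print(query("beach", rng.integers(0, 256, (32, 32, 3))))
```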

    Associate Editors

    No full text
    The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory, and application of database systems and their technology. Letters, conference information, and news should be sent to the Editor-in-Chief. Papers for each issue are solicited by and should be sent to the Associate Editor responsible for the issue. Opinions expressed in contributions are those of the authors and do not necessarily reflect the positions of the TC on Data Engineering.